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Abstract 

It has recently been shown that in some DNA microarrays the time needed to reach thermal 
equilibrium may largely exceed the typical experimental time, which is about I5h in standard 
protocols (Hooyberghs et al. Phys. Rev. E 81, 012901 (2010)). In this paper we discuss 
how this breakdown of thermodynamic equilibrium could be detected in microarray experi- 
ments without resorting to real time hybridization data, which are difficult to implement in 
standard experimental conditions. The method is based on the analysis of the distribution of 
fluorescence intensities / from different spots for probes carrying base mismatches. In thermal 
equilibrium and at sufficiently low concentrations, log/ is expected to be linearly related to 
the hybridization free energy AG with a slope equal to l/RT exp , where r exp is the experimental 
temperature and R is the gas constant. The breakdown of equilibrium results in the deviation 
from this law. A model for hybridization kinetics explaining the observed experimental be- 
havior is discussed, the so-called 3-state model. It predicts that deviations from equilibrium 
yield a proportionality of log/ to AG/RT e ff. Here, T e s is an "effective" temperature, higher 
than the experimental one. This behavior is indeed observed in some experiments on Agilent 
arrays. 16 ' 20 We analyze experimental data from two other microarray platforms and discuss, 
on the basis of the results, the attainment of equilibrium in these cases. Interestingly, the same 
3-state model predicts a (dynamical) saturation of the signal at values below the expected one 
at equilibrium. 

Introduction 

While nucleic acid hybridization has been quantitatively characterized for strands binding in a 
bulk solution, 1 a microscopic-based understanding of hybridization in microarrays is still lacking. 
Compared to hybridization in solution, the characterization and understanding of physicochemical 
properties arising from hybridization in DNA microarrays is more challenging, and several groups 
dedicated their research to this subject 2-12 (see also the reviews in refs 13 ~ 15 ). 

There are two main reasons which make this task difficult. The first one arises due to differ- 



ences between different microarray platforms; for example the fabrication method, surface chem- 
istry, the length of the surface-tethered probe strands and their density might significantly differ 
from platform to platform. Since it is a priori rather difficult to assess the influence of the above- 
mentioned factors (for a theoretical analysis of factors influencing hybridization in microarrays see 
refs. 13 and 15 ), it is necessary to investigate the properties and characteristics of each platform sep- 
arately. The second reason arises due to the complexity of the samples (i.e., cell extract) which are 
used in most hybridization experiments. In these experiments, next to the intended hybridization 
between a probe and its complementary target in solution, a wide range of other reactions may 
occur: cross hybridization between partially complementary targets in solution, target folding, and 
cross hybridization between probes and partially complementary targets. Therefore, although a 
large number of biological data is available, it is difficult to disentangle from these experiments 
the properties of hybridization between targets and surface-bound probes from other spurious ef- 
fects. A better understanding of the microscopic properties of hybridization in DNA microarrays 
involves the analysis of well-controlled experiments with simple target solutions. 

In this context, we recall the outcome of recent experiments, 16 in which a solution containing a 
single target sequence at different concentrations was hybridized to a large number of mismatched 
probe sequences. It was shown that the general assumption of thermodynamic equilibrium, com- 
monly used when discussing experimental data, does not hold. Apart from using a simple hy- 
bridization target solution, these experiments were carried out at standard conditions with respect 
to the buffer, the temperature, and the hybridization times. The lack of thermal equilibrium was 
shown to cause a decrease in specificity of microarrays, 16 therefore it implies very practical con- 
sequences: it limits the performance of the microarrays below their maximal attainable level. 

The aim of this paper is to discuss the microscopic origins of the breakdown of thermal equi- 
librium and to investigate experiments on different microarray platforms where such effects might 
be of relevance. We start by reviewing the two-state Langmuir model of DNA hybridization and 
discuss a recent extension. The latter is termed three-state model and includes a long-lived par- 
tially hybridized state. This model offers a plausible explanation of the observed nonequilibrium 



behavior. We then turn our attention to three different platforms for which controlled hybridization 
experiments are available, and discuss evidence of the breakdown of equilibrium in these cases. 

Two- vs. Three-State Kinetic Model 

Hybridization in solution as well as in DNA microarrays is commonly described by a two-state 
process using the so-called Langmuir model (see, e.g., 15 for a review). According to the Langmuir 
model the fraction of hybridized probes, 0, is given by 

dB 

— = ck l (i-e)-k-iO, CD 

where k\ and k-\ are the on- and off -rates, and c is the target concentration in solution. The 
rates are linked to the equilibrium free energy AG via k\/k-\ = exp(—AG/RT), where T is the 
temperature, and R the gas constant. Note that by convention AG < 0, and strong target-probe 
affinities thus imply larger negative values for AG. The two-state hybridization model is assumed 
to be valid if the hybridizing sequences are sufficiently short such that intermediate states can 
be ignored. In solution, the zippering of two hybridizing sequences is sufficiently rapid, and the 
two-state assumption is expected to be applicable to oligomers containing up to 30 nucleotides. 1 
By imposing dO/dt = on Eq. (1), we find the equilibrium value 



cki/k_i 
1 +ck\/k-\ 



e eq = , , 1 ; ,; , (2) 



which becomes 



eq « ck\/k-\ = cexp(-AG/#r), (3) 

in the limit ck\ <C k-\ (i.e. low concentration and/or weak binding). 

In this paper we consider for fixed target concentrations c 6 vs. AG plots (in this section), or 



equivalently, intensity / vs. AG (in section describing experiments). Equation (3) shows that far 
from chemical saturation (6 <C 1) the equilibrium isotherm is characterized by the linear relation- 
ship log 6 oc AG, with a slope equal to l/RT. The full solution of Eq. (1) is 

e(t) = e &q (i-e- t / T ), (4) 

where x is the characteristic time such that 

%~ l = ck\+k-\. (5) 

In order to understand the relaxation to equilibrium of sequences with different AG , we make 
a hypothesis concerning the rate constants. Experiments for hybridization in solution and in mi- 
croarrays show that k\ only weakly depends on the sequence composition. 17 ' 18 Typically, there is 
a much stronger sequence dependence of the off -rate k-\. We thus approximate 

k\ = a, k-\ = aexp(AG/i?r), (6) 

where a is sequence independent. Figure 1(a) shows vs. AG for a fixed concentration c (see 
Eq. (4)) on a semilogarithmic scale at five different hybridization times (dashed lines), where 
the rates are chosen according to Eqs. (6). The solid line coincides with the equilibrium regime 
(limiting case t — > oo). The dashed lines and the equilibrium isotherm merge for small binding 
affinities. This indicates that weakly-bound sequences equilibrate faster than more strongly bound 
ones. This behavior is a consequence of Eqs. (5) and (6): The relaxation time decreases for weak 
binding. In the limiting case of strong binding (AG far from zero i.e. large x) and short times, the 
expansion of Eq. (4) yields 9{t) « act which no longer depends on AG. 
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Figure 1: Plot of log0 vs. AG for the two- and three-state models, (a) Two-state model 
with c = 5pM, a = 10 4 ,s _1 M _1 , and RT = 0. 67 'kcal.mol . The curve with solid line is 
the equilibrium isotherm (Eq. (2)), the other lines correspond to plots of 6(t) (Eq. (4)) for 
t = {4, 17, 86, 700; 4000}/? (bottom to top), (b) Three-state model for c = 5pM, a = 10 5 j _1 Af _1 , 
CO = Is -1 , RT = 0. 67 'kcal.mol , and 7= 1/3 : Sum of the solutions of Eqs. (7a) and (7b) 
(6 — Q\ + O2) vs. AG for different t (same as in (a)) with equilibrium value of 9 (solid line). An ef- 
fective temperature T e g > r exp can be defined for the intermediate regime (solid straight lines), (c) 
Three-state model: Fixed hybridization time t = 17 'h and different concentrations varying between 
5pM and 500nM by steps of a factor 10 (from bottom to top). Remaining parameters identical 
to those in (b). System equilibrated for the largest concentration. The intermediate regime with 
an effective temperature T e & can also be observed (solid straight lines). Inset: vertical axis shows 
fraction of hybridized probes divided by the concentration, i.e. 61c. 
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We refer to this as the dynamical saturation regime which appears as a constant limiting be- 
havior in the vs. AG plot as shown in Fig. 1(a). Instead, we refer to chemical saturation as the 
limit 6 — >■ 1, where all available probes are hybridized, and no further hybridization is possible. 
We emphasize that the existence of the dynamical saturation limit is linked to the choice of the 
rates k\, k-\ (see Eqs. (6)), where we assumed that k\ is sequence independent. 

In view of the subsequent discussion of experimental data, we review a recently introduced 
extension of the two-state model. 16 This extension includes an intermediate long-lived configura- 
tion of partially hybridized molecules as sketched in Fig. 2. The presence of the intermediate state 
slows down hybridization to a fully formed duplex due to the interaction of the target molecule 
with multiple probes. There are now four rate constants and the model is defined by two coupled 
linear equations 

1 "' = cki(l-e 1 -e 2 ) + k-202-(k-i+k 2 )d h (7a) 



dt 

d0 2 
dt 



k 2 e l -k_ 2 e 2 , (7b) 



where 0\ and 6 2 denote respectively the fractions of partially and fully hybridized states. The ratios 
between the forward and backward rates are fixed by thermodynamics 

A. = e-AG'/RT^ (ga) 

k-\ 

k^ =e -(AG-AG')/RT^ (8b) 

k- 2 

where AG and AG' are the hybridization free energies of the fully and partially hybridized states, 
respectively (both are free energy differences with respect to the unhybridized state). In general, 9\ 
describes a distribution of partially hybridized states where only a fraction of the target-nucleotides 
is bound to a probe and the associated free-energy AG' can be considered as an effective AG. We 
expect the fully hybridized state to be energetically more favorable than the partially hybridized 




Figure 2: Sketch of the three-state model for hybridization in DNA microarrays. 9\ and O2 are the 
fractions of partially and fully hybridized strands respectively. This model is specified by the four 
rate constants. 

state, i.e. AG < AG'. The forward reaction rates k\ = a and k^ = CO are assumed to be sequence 
independent while the reverse rates are fixed by the relations in Eqs. (8). We then assume a mono- 
tonic link between AG and AG'. Consider two sequences with different total hybridization free 
energies where the first one has higher binding affinity than the second AGi < AG2 (Note that this 
may refer to the same probe with one being a perfect match and the other a mismatching sequence). 
For the partially hybridized states, we then expect AG[ < AG' 2 . To establish a relationship between 
the two free energies, we make the simple assumption 

AG' « yAG, (9) 

where y < 1 since AG < AG'. With this input we numerically solve Eqs. (7a) and (7b). The solution 
can be expressed in terms of eigenvalues and eigenvectors of a 2 x 2 matrix. Figure 1(b) plots the 
solution for the fraction of hybridized probes G — 9\ + 62 for five different times (dashed lines). 
The solid line corresponds to the equilibrium isotherm, which is identical to that of the two-state 
model in Fig. 1(a). In addition, we find a regime of dynamical saturation for very strongly bound 
sequences, which is analogous to that of the two-state model. The difference between the two 
models is the emergence of a new nonequilibrium regime. There, the logarithm of the fraction 
of hybridized probes varies linearly with AG with a slope equal to j/RT , where the parameter y 
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Table 1: Overview of experimental details of the three platforms analyzed in this paper. 





Agilent 


Affymetrix 


Codelink 


Temperature (°C) 


55,65 


45 


45 


Hybridization time (hours) 


17,86 


16 


12 


Target length (nucleotides) 


25,30 


14-25 


70 


Probe length (nucleotides) 


25,30 


25 


30 


Number of different targets 


1 


150 


8 


Target concentration (picomolar) 


50, 500, 5000 


1.4. 10- 3 -1.4 -10 3 


3000 



is defined in Eq. (9). We note that a regime characterized by a slope smaller than expected from 
the equilibrium isotherm, is equivalent to introducing an effective temperature T e ff = T sxp /y higher 
than the experimental one. 

We conclude this section with a discussion of the dependence of 9 on the target concentration. 
Figures 1(a) and (b) show 9 at a fixed concentration with varying hybridization times. Let us now 
consider a protocol in which the concentration is varied and the time fixed. In all cases far from the 
chemical saturation limit 9 <^ 1, the data scale linearly with the concentration in the two as well 
as the three-state model. This behavior is a consequence of the form of Eqs. (1) and (7a): far from 
chemical saturation we approximate ck\{\ — 9) ~ ck\ in Eq. (1), and ck\{\ — 9\ — 62) ~ ck\ in 
Eq. (7a), such that the concentration c only enters in the source term of the differential equations in 
the two models. As a result 9 is proportional to c far from chemical saturation. This is illustrated 
in Fig. 1(c) where the hybridization time is fixed to t = 17/?. The concentration values range from 
c = 5pM to c = 500nM in increments of a factor 10 (from bottom to top in the main graph). Only 
at the highest concentration the system has reached equilibrium after t = \lh (solid line), at all 
other concentrations the system is not equilibrated. This is a consequence of a dependence of the 
equilibration time on c (see Eq. (5) for the two-state model). The inset of Fig. 1(c) shows the 
intensity divided by the concentration. For the two lowest concentrations the 9/c plots overlap for 
the whole range of AG, while there is no overlap if 9 is close to 1. 



Experiments 

In this Section we discuss the experimental results obtained by different groups on three different 
microarray platforms. The details of the experimental setups are summarized in Table 1. These 
experiments are performed on samples containing a controlled number of different sequences in 
solution at some fixed concentrations. Agilent arrays held one, Codelink 8 and Affymetrix 150 
different target sequences. In all cases the target hybridizes to a perfect match probe and to a 
number of mismatching probes. The emphasis of the analysis lies on the behavior of the signal 
intensities as a function of hybridization free energies AG. To estimate these free energies we 
use the nearest-neighbor model, 1 according to which AG can be written as a sum of parameters 
depending on the identity and orientation of neighboring nucleotides. Nearest- neighbor parameters 
from experiments on hybridizing strands in bulk solutions have been intensively investigated in the 
past and are available in the literature (see e.g. 19 ). As a first approximation, these parameters 
could also be used for hybridization in microarray s. In controlled experiments the parameters can 
also be extracted from microarray data, see for instance Ref. 20 In the following AG so i refers to 
estimated free energies from experiments in bulk solution (data from 19 ) and AG^array to estimated 
free energies from microarray data. 

Custom Agilent arrays 

We review some experimental results recently obtained by two of us 16 ' 20 and present novel exper- 
imental data of custom Agilent arrays. In this setup only one sequence is present in solution at 
various concentrations. The microarray is then designed to contain probes with either one, two or 
no mismatches with respect to the target sequence in solution. In total there are about 10 3 different 
probe sequences, replicated 15 times on the array (for the details see Ref. 20 ). The experimental 
data are shown in Fig. 3, which reproduces a pairwise comparison of the logarithm of the same 
intensities vs. AG so i (left column) and vs. AG^array (right column) for different target lengths and 
hybridization times. 
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Figure 3: Left column: I vs. AG. TO /, right column: I vs. AAG^array obtained under different 
experimental conditions in an Agilent platform. The full lines in (a,c,e) are the Langmuir isotherm 
with r exp , as expected at equilibrium. The dashed line in (a) is the Langmuir isotherm with T e ff = 
3r exp (intermediate regime). The dashed lines (b,d,f) are fitted using the three-state model with 
7= 1/3, k[ = 5.10' 3 M~ l s~ 1 , &2 = l-> , and we introduce a shift of 1 .5kcal/mol on the AG axis. 
The amplitudes are slightly adjusted to match the intensity scale in the different experiments. (a,b): 
Data collapse using AAG^an-ay of 20 {right) better than for parameters obtained in solution {left). 
(c,d) Intermediate regime almost vanished since equilibration is reached for nearly all sequences. 
The effect is more pronounced for AAGjuarray than for AG 50 /. (e,f) Faster equilibration for shorter 
sequences: Compared to longer strands, the binding free energy here is smaller, resulting in a 
shorter characteristic time. (a,b): Data from. 20 (c,d): Data from. 16 (e,f): New experiment. 
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Here, AG so i is calculated based on nearest neighbor parameters by SantaLucia 19 at 1M [Na+] 
concentration. The hybridization buffer in the experiment is the standard Agilent buffer which 
contains a buffering agent, monovalent cation and a ionic surfactant. It is known that the salt con- 
centration influences the stability of the double helix and, thus also AG. However it usually affects 
all nearest- neighbor parameters by a constant salt-dependent constribution. 19 A change in salt con- 
centration (and presumably of other chemicals) would result in a global shift of the horizontal axis 
in Fig. 3. This, however, does not influence our analysis. More details about the effect of salt 
concentration and about electrostatic interactions are discussed in the Appendix 1. The AG^arrays 
are obtained from a linear least-square fit of the data, as explained in Ref. 20 The left column of 
Fig. 3 plots the data as function of AAG^an-ay = AG^array — AGpMjiiarray, i-e- the subtraction from 
the hybridization free energy of the perfect match probe, which cannot be determined from the 
fitting procedure. 20 The similarities of the plots in both columns of Fig. 3 show a correlation be- 
tween AG sc ,i and AG^array- 20 However, when plotted as a function of AAG^array the data "collapse" 
better into single master curves, compared to the plots as a function of the nearest-neighbor model 
parameters in solution which are more spread. 

Figures 3(a,b) show the measured intensities for a target strand of length L = 30 hybridizing for 
\lh, which is the typical hybridization time of the standard protocol used in biological experiments. 
In the cases (c,d) the time was increased to 86/?, while the cases (e,f) correspond to a shorter 
target sequence (L = 25) hybridized for 17/?. All data are obtained at a temperature of T — 65°C. 
Experiments were also repeated under the same conditions at different target concentrations; a 
change in global concentration leads to an overall multiplicative factor in the measured intensities 
as explained in the Appendix 2. 

The solid lines in Fig. 3(a,c,e) correspond to the slope expected in the case of thermodynamic 
equilibrium (l/RT exp ), the dashed line in Fig. 3(a) corresponds to the Langmuir isotherm with an 
effective temperature T e ff = 3T exp , whereas the dashed lines in (b,d,f) are obtained from the solution 
of the three-state model. The measured intensity is assumed to be proportional to the fraction of 
hybridized probes I =A6. The scale factor A is slightly adjusted in the dashed lines of Fig. 3(b,d,f) 
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to match for a global scale difference in intensities in the different experiments. Apart from the 
rescaling the dashed lines are obtained by using the same values for the adjustable parameters in 
the three- state model (the values are given in the caption). When increasing the hybridization time 
(from (b) to (d)) a larger fraction of the data tend to align along the expected equilibrium isotherm. 
Full equilibration is reached after I7h when using a shorted target length (f), consistent with the 
three-state model prediction: shorter sequences have lower binding affinity (lower —AG), and are 
therefore expected to equilibrate faster. 
,6 



5000pM 
500pM 




14 16 18 20 22 24 26 28 30 



-AG , (kcal/mol) 



30, T exp = 55°C and a hybridiza- 



Figure 4: / vs. AG for c = {50, 500, 5000}pM, L probe = L target 
tion time of 17/z. Lowering the temperature shifts the free energies to higher —AG's compared to 
Fig. 3(a). The dotted lines are a fit from the three-state model using the same parameters as Fig. 3. 
Note that the shape for the largest concentration is flattened due to the saturation of the scanner 
around 10 5 . 



Figure 4 shows experimental data for L = 30 and a hybridization time of \lh at a temperature of 
T = 55°C. Lowering the temperature leads to higher binding affinities (higher —AG), which causes 
an increase in the relaxation time according to the three-state model. We therefore expect that these 
data would be even further away from thermodynamic equilibrium. This is indeed experimentally 
confirmed. In the present case none of the data, even for the lowest intensities attains the limiting 
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behavior / oc e~ AG ' RT , which suggests a breakdown of thermal equilibrium. Note that we did 
not plot the data as function of AAG^an-ay since it is only possible to extract these parameters if 
equilibrium data are available, which is not the case at T = 55°C. The dashed lines in Fig. 4 are fits 
of the three-state model, with the same fitting parameters compared to the case T = 65°C (see figure 
caption). The data at the highest affinities tend to bend towards a flat asymptotic limiting behavior, 
which suggests that they approach the dynamical saturation limit, discussed in the previous section. 
For the sake of completeness, we mention that the signal of the highest concentration is even more 
flattened due to the saturation of the scanner around 10 5 . Note also that there is a linear scaling 
with the concentration and this is verified in a range of two orders of magnitude in c (from 50 to 
5000pM). 

Custom Affymetrix Microarray 

We discuss next the experiments by Suzuki et al. 21 In these experiments the arrays were custom- 
designed and synthesized on the Maskless Array Synthesizer platform with the Affymetrix Nim- 
bleExpress program. The target length was L = 25 nucleotides throughout the experiments. 150 
artificial target sequences with a CG-content ranging from 32% to 72% were used such as to match 
typical distribution of probes of the Affymetrix E. Coli array. The probe lengths varied from L = 14 
to L = 25 nucleotides. The probes were shortened from the 5'-end. This implies that hybridization 
between probes with L < 25 nucleotides and targets produced duplexes with dangling ends afar 
from the microarray surface. For each target sequence three corresponding PM sequences and all 
possible mismatch combinations were present on the surface, i.e. three mismatches per position 
along the strand. In order to avoid terminal effects (it is known that mismatches close to the he- 
lix ends have different stability properties compared to mismatches in the center of the helix 22 ), 
our analysis considers only those mismatches which are at least three nucleotides spaced from the 
ends. The concentration of each target sequence ranged from lAfM to lAnM incrementing in 
steps of factor ten; a concentration of a single target of lAnM corresponds to a total concentration 
of 1.4 x 150 = 2l0nM. The hybridization time was 16/z at T = 45°C. The data cover a broad range 
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of concentrations and AG, compared to the setup discussed in the previous section. 

Figure 5 shows a log/ vs. AG so i plot of the data for different values of the total concentration. 
A large number of data is available for each experiment (78300 data points including the PM's and 
single mismatches of the 150 different targets). To help the visualization of the data were binned 
in intervals [AG— 5/2, AG + 8/2] with 8 = 2kcal/mol. We then calculate the median value of 
the intensities within one AG interval of all probe sizes. The error bars in the graphs provide an 
estimate of the typical spreading of the data using the mean absolute deviation. 




20 25 

-AG (kcal/mol) 

Figure 5: / vs. AG curves for dataset 1 obtained by Suzuki et al. 21 Binding free energies AG 
determined by means of nearest neighbor model using the values obtained in solution. 19 Saturation 
occurs at different values for different concentrations. The dotted and solid lines correspond to the 
Langmuir isotherm using respectively r exp = 318^ and T e ff = S50K ps 2JT exp . 



For very weak binding (—AG < lOkcal/mol), the data of Fig. 5 tend to a constant background 
value. The background level increases as the concentration is increased, due to aspecific binding, 
as expected. The slope of the dashed line in Fig. 5 is l/RT exp . The solid line (which provides a 
good extrapolation of the low —AG data) corresponds to an "effective" temperature of T e ff = S50K 
(a value close to the one found in the analysis of the experiments in the Agilent platform). Thus, 
these experimental data cannot be reconciled with equilibrium thermodynamics. In addition, we 
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note that for strong binding affinities the data reach a saturation value. This behavior would be 
consistent to the dynamical saturation predicted by the kinetic models discussed in the section 
describing the two- and three- state models. 

In Fig. 6 we plot the three-state model isotherm (thick solides lines) corresponding to an hy- 
bridization time of 16/z as in the experiment and using the values for the parameters k\, &2 and 
7 given in the caption. The model fits well the data within two orders of magnitude in concen- 
tration. To fit the data in the low —AG limit we added a background value (To) : I = Iq + A6 
(where = 0\ + 62 is the total fraction of bound probes as calculated from the three-state model). 
Higher concentrations are also well fitted by the three-state model, provided the adjustment of 
the global fluorescence intensity level by rescaling the prefactor A. This rescaling is necessary 
since at high concentrations the intensities in Fig. 5 no longer scale linearly with the concen- 
tration, differently from what observed in all the experiments on Agilent arrays (see Fig. 4 and 
Refs. 16 ' 20 ). This behavior is to be expected close to chemical saturation 9 — ¥ 1, but not in the 
region \5kcal/mol < —AG < 20kcal/mol, where the measured intensities are very far from satu- 
ration. It suggests that other effects may play a role such as e.g. interactions (partial hybridization) 
between target molecules in solution. The experimental data of Fig. 5 have also been analyzed in 
the recent literature 23 ' 24 and were fitted with a model taking target depletion into account. Deple- 
tion arises when a relevant fraction of the target molecules in solution hybridizes to the microarray, 
so that during the experiment the concentration in solution decreases. Depletion would explain why 
the intensities at high binding affinity reach a saturation value below the chemical saturation, 23 ' 24 
without the need of invoking a nonequilibrium effect. However the depletion models of Refs. 23 ' 24 
did not addressed the differences between the experimental data and equilibrium thermodynam- 
ics in the low binding limit (—AG < 20kcal/mol in Fig. 5), whereas the nonequilibrium scenario 
developed within the three-state model could explain the experimental data in both regimes. 
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Figure 6: For three intermediate concentrations, the experimental data are well fit by the three-state 
model (solid thick lines). The parameters are: y = 0.374, k\ = 10 5 ,s _1 M _1 and &2 = ls~ l . The 
value are scaled by a factor A = 2 18 in order to match the experimental data. We have adjusted the 
value of the background Iq for each concentration. 



It remains to be proven that the observed behavior is due to breakdown of thermal equilibrium. 
One possibility would be to perform hybridization experiments at higher temperatures. According 
to the three-state model higher temperatures (recall that the experiments of Fig. 5 are at T = 45°C) 
would favor equilibrium due to a decrease in —AG. This is indeed observed in Agilent arrays (see 
Figs. 3, 4). We recall that other effects could also explain why the slope of the / vs. AG data is 
smaller than what expected from the Langmuir model. A recent study 28 showed that synthesis 
errors (i.e. the synthesis of incorrect nucleotide) along the probe may also influence this slope. 
At present, further experiments are necessary to clarify the origin the discrepancy between the 
Affymetrix data and the equilibrium Langmuir model. 

Finally, we used the experimental data to perform a linear least square fit to obtain estimates of 
the nearest neighbor parameters. The interval range was chosen such that log I <* AG, far from any 
saturation behavior. We chose the two largest concentrations c = \A0pM and c = lAnM, focusing 
on the hybridization of size L = 20 with a range of AG between l3.5kcal/mol and l9kcal/mol. In 
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this range log/ is approximately linear as a function of AG as seen in Fig. 5. The fitting procedure 
is similar to the one described in Ref., 20 and it is obtained through singular value decomposition 
from the minimization of a quadratic cost function. The procedure provides 58 nearest neighbor 
parameters, of which 10 refer to perfect match parameters and 48 to single mismatch parameters. 
Due to degeneracies, 25 some of these parameters are not unique. In order to compare them with 
the corresponding parameters in solution, we consider the combination: 



AAG 




T A C 

; ; ; ag 

A C g' 



'T A C* 



A T G 



\/i C Lr/ \/i C Lr / \ /i 1 Cr / 



which is instead unique. 20 Figure 7 shows a comparison between the parameters obtained from a 
fit of the microarray data compared to those obtained from hybridization/melting experiments in 
Ref. 19 The Pearson correlation coefficients are 0.784 for \40pM and 0.735 for lAnM, indicating 
a moderate correlation between the two sets. The inset shows a plot between parameters obtained 
from the \40pM experiment vs. those of the \AnM experiment. There is a strong correlation 
between these two sets (Pearson coefficient 0.983) confirming the robustness of the extraction pro- 
cedure of the parameters. However, note that this last analysis has to be considered with care. The 
validity of the estimation of the parameters in a regime where the use of an effective temperature 
is needed (and possibly a regime out-of-equilibrium) is still an open question. 

Codelink activated slides 

Last, we present another set of experiments on custom Codelink activated slides by Weckx et al. 
In these experiments the target solution contained 8 different sequences. Note that only four were 
considered in the analysis (since the other four are richer in AT and bind very weakly to their 
probes, having a signal just above the the background level). The target concentration, temperature 
and hybridization times were c = 3000pM, T = 45°C, and 12/z respectively. The target is a 70-mer, 
with a stretch of 30 nucleotides complementary to the probes and 40 nucleotide spacer terminating 
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Figure 7: Main graph: AAG^ a rray vs. AAG so i. The data of Suzuki et al 21 are fitted (probe size 
L = 20) using the linear least-square method in the interval —AG between 13.5 and \9kcal/mol 
for two concentrations \A0pM and lAnM. We have used the effective temperatures T e ff = 750K 
for \A0pM and T e ff = 960K for lAnM. The Pearson coefficient between the two axes is 0.784 
for 140pM and 0.735 for 1 AnM, showing a moderate correlation. Inset: the results for both con- 
centrations are plotted with respect to each other in order to check the cross-correlation (Pearson 
coefficient between the two concentrations: 0.983). 



with a fluorophore. The spacer keeps the fluorophore away from the hybridization area and from 
the probe layer. Twelve probe sequences were associated to each target: one perfect match, 3 
carry one mismatch, 7 carry two mismatches and one has 3 mismatches. This sums up to 48 data 
points. Each probe was replicated 12 times on the array, the analyzed signal is the median over the 
replicates. 

The experimental results are shown in Fig. 8. The data are plotted as a function of AG so i 
calculated from the nearest-neighbor parameters for hybridization in solution by. 26 The horizontal 
line corresponds to the maximum intensity detectable from the scanner (optical saturation). The 
steep line has a slope equal to l/RT exp . Most probes with low AG are distributed in the close 
vicinity of the l/^r exp line. This suggests that the system had attained thermodynamic equilibrium. 
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Although the number of data point is insufficient to extract free energy parameters, the alignment of 
the majority of the data points along a line with slope 1 /RT exp suggests a good degree of correlation 
with the hybridization free energy parameters measured in aqueous solution. 26 
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Figure 8: Background subtracted intensities vs. hybridization free energies in solution for the 
experiments of Weckx et al. 9 on custom Codelink activated slides. The data are generally in 
agreement with equilibrium thermodynamics as seen from the alignement along a line with slope 
l/RT sxp . The experimental temperature is T = 45°C for a hybridization time of I2h. 



In Codelink activated slides the probes are not directly bound to the solid surface, but to a three 
dimensional polymeric coating which shifts them further away from the surface. This is to reduce 
the steric hindrance and to facilitate hybridization. The reduced surface probe density will also 
reduce electrostatic interactions from the negatively charged DNA probes. This is probably why 
these systems attain thermal equilibrium faster than those presented above. 
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Conclusion 

Although several papers have discussed the application of physico-chemical models to describe 
microarray data, so far most works focused on equilibrium thermodynamics. 2 " 12 The tacit as- 
sumption behind the use of these models is that hybridization in typical microarray experiments is 
sufficiently long (usually 15 — 17/z), so that thermal equilibrium can be considered to be attained. 
However, in standard microarrays, which are based on fluorescence detection, it is not possible to 
perform real time measurements of the hybridization signal, then, it is difficult to test this hypoth- 
esis directly from experimental data. 

In this paper we discussed how the attainment of equilibrium can be checked in experiments. 
We analyzed experiments from three different microarray platforms where the number of mis- 
matches per probe varied from to 3. This is an experimental setup which can offer very interest- 
ing insights about the physical properties of hybridization. First of all, given a sufficiently large 
number of mismatches, one can cover a broad range of intensities in an experiment even with a 
single hybridizing sequence, avoiding thus interactions between different targets in solution. Sec- 
ond, the attainment of thermodynamic equilibrium can be tested from an analysis of the slope of 
the experimental data in a log I vs. —AG plot. We expect a slope equal to l/RT exp for equilibrium 
data, whereas a deviation to a different slope can be taken as an indication of the breakdown of 
equilibrium (sufficiently far from chemical saturation). We note also that in high density arrays 
used in biological experiments, the slope of log/ vs. —AG does not usually match the experi- 
mental temperature, 2 ' 7 which could be a signature of nonequilibrium behavior, although different 
explanations exist. 28 

To explain the observed experimental results we discussed a three-state model of hybridiza- 
tion. This is an intermediate between the unbound state and the fully hybridized state. For finite 
hybridization times, the three-state model predicts that sufficiently weakly bound targets are in 
equilibrium, while nonequilibrium effects are expected to occur for higher binding affinities. This 
feature is generally observed in experiments. In the limit of strong binding affinities the model 
predicts the existence of a dynamical saturation limit, in which the hybridization signal becomes 
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independent on AG. These features are also observed in experiments on different platforms. 

We expect that the absence of thermal equilibrium has important practical consequences for 
the functioning of the microarrays also in biological experiments with complex mixture of target 
sequences from mRNAs extracts. In that case different target sequences "compete" for hybridiza- 
tion to the same probe. Let us consider the case of two competing targets, one perfect matching 
to the probe and one carrying mismatches. The hybridization of the probe with the latter produces 
a so-called cross -hybridization signal. Assuming that the concentration of the targets is low, the 
following model is expected to hold: 

9 = cf(AG)+c'f(AG') (10) 

where c is the concentration of the perfect matching target and AG its hybridization free energy; c' 
and AG' refer to the mismatched target. The function /() describes the isotherm. In thermal equi- 
librium f(x) = e~ x ' RTex v. A high specificity, which is the desired working condition, corresponds 
to a perfect match signal dominating over the cross-hybridization contribution. This is obtained 
when the ratio /(AG) //(AG') ^> 1, being c and c' fixed in an experiment. This ratio is maximal 
in thermodynamic equilibrium. If, for instance, AG' — AG = 3kcal/mol (typically the free energy 
difference of one mismatch) at equilibrium one has /(AG) //(AG') = exp((AG' — AG)/RT exp ) ps 
90. In the nonequilibrium regime characterized by an effective temperature T t Q = 3T exp one has 
/(AG) //(AG') = exp((AG' — AG)/7?r e ff) w 4. Thus the thermodynamic equilibrium corresponds 
to the highest specificity. 
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Appendix 1: The effect of salt concentration 

Isotherms different from the Langmuir model were discussed in the microarray literatures of the 
last decade. 27 Particular attention was devoted to the the effect of electrostatic interactions, which 
arise during hybridization between a target molecule and a dense probe layer. As DNA is negatively 
charged, an additional electrostatic repulsive force may arise when a target molecule approaches 
the probes at the microarray surface. 

In the context of a mean- field approximation Vainrub and Pettit, 27 Halperin Buhot and Zhulina 29 
derived the following equilibrium model 

0e q - e -^G/RT-T{\+e, q )/RT n i x 

c(i-etq) l j 

where, as in the main text, e q is the fraction of hybridized probes, c the target concentration, AG 
the hybridization free energy between a single probe and a single target. The term T(l + 6 eq )/RT 
accounts for the electrostatic repulsion, where T is a constant independant of the sequence. Note 
that if r = one recovers from Eq.(l 1) the Langmuir isotherm of Eq.(2). At high salt concen- 
trations, electrostatic interactions are screened. Equation (11) shows that at larger coverages (0 eq 
large) the electrostatic effects increase, reducing target-probe binding affinity. 

The experiments discussed in this paper corresponds to the low concentration limit, as the 
measured intensities in all cases results proportional to the global target concentration c. In the 
limit c — > 0, Eq.(l 1) becomes 

esq = ce -±G/RT-r/RT + (12) 

where the dots indicate higher orders in c. Compared to the limit obtained from the Langmuir 
model, the electrostatic effects provides an additional contribution to the hybridization free energy 
(T). In the mean-field model discussed in the literature, Y is proportional to the charge density 
of the unhybridized layer, to its thickness and to the length of the target DNA. It is however in- 



23 



dependent from the sequence composition. T is therefore the same for hybridization to a perfect 
matching probe or to a probe with one or two mismatches. In conclusion, in the low concentra- 
tion limit, according to the above model, electrostatic effects can cause a uniform shift of the free 
energy scale, compared to hybridization free energies in solution. 

Appendix 2: The concentration scaling 

As discussed throughout the manuscript all experiments shown are performed in a regime far from 
the chemical saturation limit which corresponds to — > 1. In the limit <C 1, and thus the 
measured intensity, is proportional to the target concentration c. We show this explicitly here 
for one set of experiments on Custom Agilent arrays. Figure 9 shows the same data as 3 (a,b) 
(Lprobe — ^target = 30, T = 65°C) for different concentrations c = {2, 10, 50, 250}pM from bottom 
to top. The different curves clearly display an overall linear scaling relationship with respect to the 
concentration, both in the weak (equilibrium) and strong (nonequilibrium) binding regimes. The 
fivefold increase in the concentration correspond to a fivefold increase of the fluorescence intensity. 
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Figure 9: Data obtained with Agilent platforms for a target and probe length of 30 nucleotides at 
65°C with various concentrations c = {2, 10,50, 250} pM from bottom to top (data from 16 ). The 
linear scaling in concentration for the different curves means that the kink does not correspond to 
the approach toward chemical saturation where the linearity does not hold anymore. 



References 

(1) V. A. Bloomfield, D. M. Crothers, and I. Tinoco, Jr., Nucleic Acids Structures, Properties and 
Functions (University Science Books, Mill Valley, 2000). 

(2) G. A. Held, G. Grinstein, and Y. Tu, Proc. Natl. Acad. Sci. 100, 7575 (2003). 

(3) D. Hekstra, A. R. Taussig, M. Magnasco, and F. Naef, Nucleic Acids Res. 31, 1962 (2003). 

(4) L. Zhang, M. F Miles, and K. D. Aldape, Nature Biotech. 21, 818 (2003). 

(5) M. F Hagan and A. K. Chakraborty, J. Chem. Phys. 120, 4958 (2004). 

(6) H. Binder, S. Preibisch, and T Kirsten, Langmuir 21, 9287 (2005). 

(7) E. Carlon and T Heim, Physica A 362, 433 (2006). 



25 



(8) D. J. Fish et al., Nucleic Acids Res. 35, 7197 (2007). 

(9) S. Weckx, E. Carlon, L. De Vuyst, and P. Van Hummelen, J. Phys. Chem. B 111, 13583 
(2007). 

(10) L. Zhang, C. Wu, R. Carta, and H. Zhao, Nucleic Acids Res 35, el8 (2007). 

(11) C. J. Burden, Phys Biol 5, 16004 (2008). 

(12) T. Naiser, J. Kayser, T. Mai, W. Michel, and A. Ott, BMC Bioinformatics 9, 509 (2008). 

(13) A. Halperin, A. Buhot, and E. B. Zhulina, J. Phys. Cond. Matt. 18, S463 (2006). 

(14) R. Levicky and A. Horgan, Trends Biotechnol. 23, 143 (2005). 

(15) H. Binder, J. Phys.: Condens. Matt. 18, S491 (2006). 

(16) J. Hooyberghs, M. Baiesi, A. Ferrantini, and E. Carlon, Phys Rev E 81, 012901 (2010). 

(17) M. Glazer et al., Anal. Biochem. 358, 225 (2006). 

(18) C. R. Cantor and P. R. Schimmel, Biophysical Chemistry, part HI: The behavior of biological 
Macromolecules (W. H. Freeman, New York, 1980). 

(19) J. SantaLucia Jr., Proc. Natl. Acad. Sci. 95, 1460 (1998). 

(20) J. Hooyberghs, P. Van Hummelen, and E. Carlon, Nucleic Acids Res. 37, e53 (2009). 

(21) S. Suzuki, N. Ono, C. Furusawa, A. Kashiwagi, and T. Yomo, BMC Genomics 8, 373 (2007). 

(22) J. SantaLucia, Jr. and D. Hicks, Annu. Rev. Biophys. Biomol. Struct. 33, 415 (2004). 

(23) N. Ono et al., Bioinformatics 24, 1278 (2008). 

(24) C. J. Burden and H. Binder, Phys Biol 7, 016004 (2010). 

(25) D. M. Gray, Biopolymers 42, 783 (1997). 

26 



(26) N. Peyret et al, Biochemistry 38, 3468 (1999). 

(27) A. Vainrub, and M. Pettitt, Phys. Rev. E 66, 041905 (2002). 

(28) T. Naiser, J. Kayser, T. Mai, W. Michel, and A. Ott, Phys. Rev. Lett. 102, 218301 (2009). 

(29) A. Halperin, A. Buhot, and E. Zhulina, Clin. Chem. 50, 2254 (2004). 



27 



